ARTICLE INFO ABSTRACT


Accepted 18 November 2023

Keywords: Data mining; Risk

BACKGROUND AND OBJECTIVES: The healthcare insurance industry faces a significant challenge in predicting individuals’ insurance costs, which depend on complex parameters such as age and physical characteristics. Insurance companies categorize policyholders into high-risk and low-risk groups to manage risks and avoid potential losses. However, accurately estimating costs for each individual can be a daunting task. By leveraging data science and machine learning techniques, insurance companies can improve their cost estimation accuracy and better manage risks. This approach can help insurance companies provide more accurate insurance coverage and pricing for individuals, leading to higher customer satisfaction and lower financial losses.
METHODS: To address this challenge, a data science and machine learning-based approach is used that applies ensemble learning to predict high-risk and low-risk individuals. The method involves several steps, including data preprocessing, feature engineering, and cross-validation to evaluate the model’s performance. The first step preprocesses the data by cleaning it, handling missing values, and encoding categorical variables. The second step generates new features using feature engineering techniques such as scaling, normalization, and dimensionality reduction. Next, ensemble learning is used to combine multiple learning methods such as logistic regression, neural networks, support vector machines, random forests, LightGBM, and XGBoost. By combining these methods, the aim is to leverage their strengths and minimize their weaknesses to achieve better prediction accuracy. Finally, the model’s performance is evaluated using cross-validation techniques such as k-fold cross-validation. These techniques help to validate the model’s accuracy and guard against overfitting.
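The steps named in the methods can be illustrated with a minimal, standard-library-only sketch. It is not the authors' implementation: the helper names, the toy records, and the choice of soft voting (averaging the base models' predicted probabilities) as the combination rule are all assumptions made here for illustration, since the abstract does not state how the models are combined.

```python
from statistics import mean


def impute_mean(values):
    """Replace missing entries (None) with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]


def one_hot(values):
    """Encode a categorical column as 0/1 indicator columns, one per category."""
    categories = sorted(set(values))
    return [[1 if v == c else 0 for c in categories] for v in values]


def min_max_scale(values):
    """Rescale a numeric column to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def k_fold_indices(n_samples, k):
    """Yield (train_idx, test_idx) pairs; each sample is held out exactly once.

    Folds are contiguous and unshuffled here; shuffling or stratifying first
    is a common refinement that this sketch omits.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    start = 0
    for fold in range(k):
        # Spread the remainder over the first folds so sizes differ by at most 1.
        size = n_samples // k + (1 if fold < n_samples % k else 0)
        test_idx = list(range(start, start + size))
        train_idx = [i for i in range(n_samples) if not start <= i < start + size]
        yield train_idx, test_idx
        start += size


def soft_vote(probabilities_per_model):
    """Average the high-risk probabilities the base models assign to each sample.

    Assumption: soft voting stands in for the paper's unspecified ensemble rule.
    """
    return [mean(per_sample) for per_sample in zip(*probabilities_per_model)]


# Hypothetical toy records, not taken from the paper's dataset.
ages = impute_mean([20, None, 40, 60])
smoker = one_hot(["no", "yes", "no", "yes"])
scaled_ages = min_max_scale(ages)
folds = list(k_fold_indices(n_samples=4, k=2))
ensemble = soft_vote([[0.25, 0.75], [0.75, 0.25]])
```

In a real pipeline, each base model would be fitted on `train_idx` and scored on `test_idx` for every fold, and the imputation mean and scaling range would be computed from the training fold only so that no information leaks from the held-out data.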
FINDINGS: The proposed approach achieves an AUC of 0.73, demonstrating its effectiveness in predicting high-risk and low-risk individuals.
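To make the reported metric concrete, the sketch below computes an AUC from labels and scores, assuming that AUC here means the area under the ROC curve. The function name and the toy labels and scores are hypothetical and are not data from the paper.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison (Mann-Whitney U statistic).

    Equals the probability that a randomly chosen positive (label 1) receives a
    higher score than a randomly chosen negative (label 0); ties count as half.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical labels (1 = high risk) and model scores, for illustration only.
toy_auc = roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
```

Under this reading, an AUC of 0.5 corresponds to random ranking and 1.0 to perfect ranking, so a value of 0.73 means that in roughly 73% of high-risk/low-risk pairs the high-risk individual receives the higher score.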
CONCLUSION: The healthcare insurance industry can benefit greatly from data science and machine learning-based approaches. By accurately predicting high-risk and low-risk individuals, insurance companies can better manage risks and provide more accurate coverage and pricing for their customers. This can improve customer satisfaction and reduce financial losses for insurance companies.

*Corresponding Author:
Phone: +9831 37934500
ORCID: 0000-0001-7805-6344
Fig. 3: Initial dataset